Advanced Concepts Team: group items matching "data mining" in title, tags, annotations or url

Luís F. Simões

Pattern | CLiPS - 2 views

  • Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks).
  •  
    Intuitive, well documented, and very powerful. A library to keep an eye on. Check the example "Belgian elections, June 13, 2010 - Twitter opinion mining".
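The tf-idf + cosine similarity metrics mentioned in the excerpt can be illustrated with a minimal standard-library sketch. This does not use Pattern's own API. The function names, the whitespace tokenizer, the sample documents and the exact weighting (term frequency times log inverse document frequency) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document using plain tf-idf."""
    # Assumption: lowercase whitespace tokenization is good enough here.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents.
docs = [
    "data mining of web pages",
    "web mining with python",
    "lunar magma ocean models",
]
vectors = tf_idf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # share "web" and "mining"
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no terms
```

The first two documents share terms and so score above zero, while the first and third share none and score exactly zero.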
Thijs Versloot

The big data brain drain - 3 views

  •  
    Echoing this, in 2009 Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira penned an article under the title The Unreasonable Effectiveness of Data. In it, they describe the surprising insight that given enough data, often the choice of mathematical model stops being as important - that particularly for their task of automated language translation, "simple models and a lot of data trump more elaborate models based on less data." If we make the leap and assume that this insight can be at least partially extended to fields beyond natural language processing, what we can expect is a situation in which domain knowledge is increasingly trumped by "mere" data-mining skills. I would argue that this prediction has already begun to pan-out: in a wide array of academic fields, the ability to effectively process data is superseding other more classical modes of research.
Luís F. Simões

When Astronomy Met Computer Science | Cosmology | DISCOVER Magazine - 1 views

  • “That’s impossible!” he told Borne. “Don’t you realize that the entire data set NASA has collected over the past 45 years is one terabyte?”
  • The LSST, producing 30 terabytes of data nightly, will become the centerpiece of what some experts have dubbed the age of petascale astronomy—that’s 10^15 bits (what Borne jokingly calls “a tonabytes”).
  • A major sky survey might detect millions or even billions of objects, and for each object we might measure thousands of attributes in a thousand dimensions. You can get a data-mining package off the shelf, but if you want to deal with a billion data vectors in a thousand dimensions, you’re out of luck even if you own the world’s biggest supercomputer. The challenge is to develop a new scientific methodology for the 21st century.”
  •  
    Francesco, please look at this and get back wrt the /. question .... thanks
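To get a feel for the scales quoted above, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and 8-byte floating-point values per attribute. Neither assumption comes from the article, and the variable names are made up for illustration.

```python
TERABYTE = 10**12  # assumption: decimal terabyte, in bytes
PETABYTE = 10**15  # assumption: decimal petabyte, in bytes

# LSST: 30 terabytes per night (figure from the excerpt).
lsst_bytes_per_night = 30 * TERABYTE
nights_per_petabyte = PETABYTE / lsst_bytes_per_night

# "A billion data vectors in a thousand dimensions" (figures from the excerpt).
n_objects = 10**9
n_dimensions = 10**3
bytes_per_value = 8  # assumption: one float64 per attribute
catalogue_bytes = n_objects * n_dimensions * bytes_per_value
catalogue_terabytes = catalogue_bytes / TERABYTE

print(f"Nights to collect one petabyte: {nights_per_petabyte:.1f}")
print(f"Dense catalogue size: {catalogue_terabytes:.1f} TB")
```

Under these assumptions, simply holding the dense catalogue takes several terabytes, before any algorithm that compares pairs of vectors is even considered.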
jmlloren

Data.gov - 2 views

shared by jmlloren on 24 Feb 10
  •  
    Interesting databases to play with data mining algorithms
Luís F. Simões

HP Dreams of Internet Powered by Phone Chips (And Cow Chips) | Wired.com - 0 views

  • For Hewlett Packard Fellow Chandrakant Patel, there’s a “symbiotic relationship between IT and manure.”
  • Patel is an original thinker. He’s part of a group at HP Labs that has made energy an obsession. Four months ago, Patel buttonholed former Federal Reserve Chairman Alan Greenspan at the Aspen Ideas Festival to sell him on the idea that the joule should be the world’s global currency.
  • Data centers produce a lot of heat, but to energy connoisseurs it’s not really high quality heat. It can’t boil water or power a turbine. But one thing it can do is warm up poop. And that’s how you produce methane gas. And that’s what powers Patel’s data center. See? A symbiotic relationship.
  • Financial house Cantor Fitzgerald is interested in Project Moonshot because it thinks HP’s servers may have just what it takes to help the company’s traders understand long-term market trends. Director of High-Frequency Trading Niall Dalton says that while the company’s flagship trading platform still needs the quick number-crunching power that comes with the powerhog chips, these low-power Project Moonshot systems could be great for analyzing lots and lots of data — taking market data from the past three years, for example, and running a simulation.
  •  
    of relevance to this discussion: Koomey's Law, a Moore's Law equivalent for computing's energetic efficiency http://www.economist.com/node/21531350 http://hardware.slashdot.org/story/11/09/13/2148202/whither-moores-law-introducing-koomeys-law
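For readers who have not met Koomey's Law, a rough sketch of what a fixed doubling period implies. The 1.57-year doubling period is the figure commonly quoted for the law, not something stated in this thread, so treat it and the function name as assumptions.

```python
# Assumption: computations per joule double roughly every 1.57 years,
# the figure usually quoted for Koomey's Law.
DOUBLING_PERIOD_YEARS = 1.57


def efficiency_gain(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Multiplicative gain in computations per joule after `years`."""
    return 2.0 ** (years / doubling_period)


gain_one_period = efficiency_gain(DOUBLING_PERIOD_YEARS)
gain_decade = efficiency_gain(10)

print(f"Gain after one doubling period: {gain_one_period:.1f}x")
print(f"Gain after ten years: {gain_decade:.0f}x")
```

With that doubling period, a decade corresponds to a gain of roughly eighty-fold in energy efficiency.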
Luís F. Simões

Kaggle: Crowdsourcing Data Modeling - 2 views

  • Kaggle is an innovative solution for statistical/analytics outsourcing. We are the leading platform for data modeling and prediction competitions. Companies, governments and researchers present datasets and problems - the world's best data scientists then compete to produce the best solutions. At the end of a competition, the competition host pays prize money in exchange for the intellectual property behind the winning model.
Marcus Maertens

Unsupervised word embeddings capture latent knowledge from materials science literature | Nature - 1 views

  •  
    New NLP results might make it possible to automate scientific discovery by data mining papers. The work considers 3.3M abstracts from materials science, physics and chemistry, and claims to identify new materials before they are reported in later publications.
  •  
    The ACT did that by digging through diigo posts at the 2014 retreat! Still without NLP.
  •  
    That's cool! Didn't know.
Luís F. Simões

In Head-Hunting, Big Data May Not Be Such a Big Deal - NYTimes.com - 1 views

  • Years ago, we did a study to determine whether anyone at Google is particularly good at hiring. We looked at tens of thousands of interviews, and everyone who had done the interviews and what they scored the candidate, and how that person ultimately performed in their job. We found zero relationship. It’s a complete random mess
jcunha

Data scientists find connections between birth month and health - 4 views

  •  
    Seems like astrologists were somehow right... Ptolemy would be proud.
  •  
    Greetings from July :-) On an unrelated note... this chart made me suddenly realise I've always thought of the year as passing counter-clockwise and starting at the bottom. Very strongly. Seems like some tempo-spatial association. Does anybody have a similar feeling?
Daniel Hennes

Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society - 3 views

  •  
    Responding to calls from the National Science Foundation (NSF) and White House Office of Science and Technology Policy (OSTP), a working group of the American Statistical Association has developed a whitepaper detailing how statisticians and computer scientists can contribute to administration research initiatives and priorities. The whitepaper includes a lot of topics central to machine learning and data mining, so please take a look.
  •  
    I guess Norvig is trumping Chomsky big time if this is the attitude of the NSF :)))
LeopoldS

The Moon's mantle unveiled - 2 views

  •  
    first science results reported in Nature (as far as I know) from the Yutu-2 and Chang'e mission .... and they look very good!
  •  
    Sure they are very useful! It will be even better if they manage to fit the data to the modeled circulation of the lunar magma ocean that formed after the "Theia" body collided with Earth. The collision was the cause of the magma ocean in the first place. The question now is how this circulation pattern of the lava Moon "froze" in time upon the phase transition to solid. Because whatever crystallizes last in the sequence is richer in elements "incompatible" with the crystal structure, we might combine data and models to predict their location. Those incompatible tracers are mainly radioactively decaying elements that produce heat (google publications about lunar KREEP: potassium (K), rare earth elements (REE), and phosphorus (P)). By knowing where the KREEP is: (1) we know where to dig when mining for them (if they are useful for something, e.g. phosphorus for plants to be grown on the Moon); (2) we avoid planning to build the future human colony on top of radioactives, of course. The hope is that the Moon, due to its lack of plate tectonics, has preserved this "signature of the freezing sequence". Let's see.
  •  
    thanks Nasia! very interesting comment
Dario Izzo

Kaggle: making data science a sport - 2 views

  •  
    Old post from Luis brought back from graveyard..... At least two good ideas to put there: 1) tipping points prediction 2) planetary phases for trajectory transfer and probably many more if we think about it a bit more
Luís F. Simões

Mapping Dark Matter Case Study - Kaggle - 3 views

  • Mapping Dark Matter competition to encourage the development of new algorithms that can measure the way dark matter causes tiny distortions in images of galaxies by changing their ellipticity, or how their shapes are stretched.
  •  
    Blog posts describing the approaches followed by the contestants that ranked 1st, 2nd and 3rd.
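As a rough illustration of what "measuring ellipticity" can mean, the sketch below computes an ellipticity from intensity-weighted second moments of a pixel grid. This is one common textbook convention, not the contestants' method and not necessarily the competition's exact definition. The function name and the toy images are assumptions, and real pipelines must also handle noise and the point-spread function.

```python
def ellipticity(image):
    """Return (e1, e2) from intensity-weighted second moments.

    `image` is a list of rows of non-negative pixel values.
    Assumption: the object spans more than one pixel, so the
    denominator (Qxx + Qyy) is non-zero.
    """
    total = sum(sum(row) for row in image)
    # Intensity-weighted centroid.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    qxx = qyy = qxy = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            qxx += v * dx * dx
            qyy += v * dy * dy
            qxy += v * dx * dy
    denom = qxx + qyy
    return (qxx - qyy) / denom, 2.0 * qxy / denom


# Toy images (hypothetical): a symmetric blob and a horizontal streak.
round_blob = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
streak = [[1, 1, 1, 1, 1]]

e_round = ellipticity(round_blob)  # symmetric, so both components vanish
e_streak = ellipticity(streak)     # fully stretched along x
```

A symmetric blob gives zero for both components, while a shape stretched along one axis pushes the first component towards plus or minus one.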
Athanasia Nikolaou

Nature Paper: Rivers and streams release more CO2 than previously believed - 6 views

  •  
    Another underestimated source of CO2 is turbulent waters. "The stronger the turbulences at the water's surface, the more CO2 is released into the atmosphere. The combination of maps and data revealed that, while the CO2 emissions from lakes and reservoirs are lower than assumed, those from rivers and streams are three times as high as previously believed." Altogether the emitted CO2 equates to roughly one-fifth of the emissions caused by humans. Yet more stuff to model...
  •  
    This could also be a mechanism to counter human CO2 emission ... the more we emit, the less turbulent rivers and streams, the less CO2 is emitted there ... makes sense?
  •  
    I guess there is a natural equilibrium there. Once the climate warms up enough for all rivers and streams to evaporate they will not contribute CO2 anymore - which stops their contribution to global warming. So the problem is also the solution (as always).
  •  
    "The source of inland water CO2 is still not known with certainty and new studies are needed to research the mechanisms controlling CO2 evasion globally." This is another source of CO2, and the turbulence in the rivers is independent of our CO2 emissions and just facilitates the release of CO2 from the waters. Dario, if I understood correctly, you have in mind a finite quantity of CO2 that the atmosphere can accommodate; to my knowledge this does not happen, so I cannot find a relevant feedback there. Johannes, H2O is a powerful greenhouse gas :-)
  •  
    Nasia I think you did not get my point (a joke, really, that Johannes continued) .... by emitting more CO2 we warm up the planet thus drying up rivers and lakes which will, in turn emit less CO2 :) No finite quantity of CO2 in the atmosphere is needed to close this loop ... ... as for the H2O it could just go into non turbulent waters rather than staying into the atmosphere ...
  •  
    Really awkward joke explanation: I got Johannes' joke, but maybe you did not get mine: by warming up the planet to get rid of the rivers and their problems, the water of the rivers will be accommodated in the atmosphere, hence the greenhouse gas of water.
  •  
    from my previous post: "... as for the H2O it could just go into non turbulent waters rather than staying into the atmosphere ..."
  •  
    I guess the emphasis is on "could"... ;-) Also, everybody knows that rain is cold - so more water in the atmosphere makes the climate colder.
  •  
    do you have the nature paper also? looks like very nice, meticulous typically german research lasting over 10 years with painstakingly many researchers from all over the world involved .... and while important the total is still only 20% of human emissions ... so a variation in it does not seem to change the overall picture
  •  
    here is the nature paper : http://www.nature.com/nature/journal/v503/n7476/full/nature12760.html I appreciate Johannes' and Dario's jokes, since climate is the common ground on which all of us can have an opinion, taking honours from experiencing weather. But, in the same way, if I try to make jokes about material science or A.I., I run a high risk of failing(!) :-S Water is a greenhouse gas; rain rather releases latent heat to the environment in order to be formed. Johannes, nice trolling effort ;-) Between this and the next jokes to come, I would stop to take a look here, provided you have 10 minutes: how/where rain forms http://www.scribd.com/doc/58033704/Tephigrams-for-Dummies
  •  
    omg
  •  
    Nasia, I thought about your statement carefully - and I cannot agree with you. Water is not a greenhouse gas. It is instead a liquid. Also, I can't believe you keep feeding the troll! :-P But on a more topical note: I think it is an over-simplification to call water a greenhouse gas - water is one of the most important mechanisms in the way Earth handles heat input from the sun. The latent heat that you mention actually cools Earth: solar energy that would otherwise heat Earth's surface is ABSORBED as latent heat by water which consequently evaporates - the same water condenses into rain drops at high altitudes and releases this stored heat. In effect the water cycle is a mechanism of heat transport from low altitude to high altitude where the chance of infrared radiation escaping into space is much higher due to the much thinner layer of atmosphere above (including the smaller abundance of greenhouse gases). Also, as I know you are well aware, the cloud cover that results from water condensation in the troposphere dramatically increases albedo which has a cooling effect on climate. Furthermore the heat capacity of wet air ("humid heat") is much larger than that of dry air - so any advective heat transfer due to air currents is more efficient in wet air - transporting heat from warm areas to a natural heat sink e.g. polar regions. Of course there are also climate heating effects of water like the absorption of IR radiation. But I stand by my statement (as defended above) that rain cools the atmosphere. Oh and also some nice reading material on the complexities related to climate feedback due to sea surface temperature: http://journals.ametsoc.org/doi/abs/10.1175/1520-0442(1993)006%3C2049%3ALSEOTR%3E2.0.CO%3B2
  •  
    I enjoy trolling conversations when there is a gain for both sides at the end :-) . I had to check up on some of the facts in order to explain myself properly. The IPCC report lists the greenhouse gases here, and water vapour is included: http://www.ipcc.ch/publications_and_data/ar4/wg1/en/faq-2-1.html Honestly, I read only the abstract of the article you posted, which is a very interesting hypothesis on the mechanism of regulating sea surface temperature, but it is very localized to the tropics (vivid convection, storms), a region in which I have very little expertise, and which is difficult to study because it has non-hydrostatic dynamics. The only thing I can comment there is that the authors define constant relative humidity for the bottom layer, supplied by the oceanic surface, which limits the implementation of the concept on other earth regions. Also, during the conversation we may be confusing the greenhouse gas with the Radiative Forcing of each greenhouse gas: I see your point of the latent heat trapped in the water vapour, and I agree, but the effect of the water is that it traps, even as latent heat, an amount of LR that would otherwise escape back to space. That is the greenhouse gas identity, and here is an image to see the absorption bands in the atmosphere and how important the water is, without vain authority-based arguments that miss the explanation in the end: http://www.google.nl/imgres?imgurl=http://www.solarchords.com/uploaded/82/87-33833-450015_44absorbspec.gif&imgrefurl=http://www.solarchords.com/agw-science/4/greenhouse--1-radiation/33784/&h=468&w=458&sz=28&tbnid=x2NtfKh5OPM7lM:&tbnh=98&tbnw=96&zoom=1&usg=__KldteWbV19nVPbbsC4jsOgzCK6E=&docid=cMRZ9f22jbtYPM&sa=X&ei=SwynUq2TMqiS0QXVq4C4Aw&ved=0CDkQ9QEwAw
Kevin de Groote

WordItOut - Transform your text into word clouds! - 4 views

  •  
    Transform your text into word clouds! Word clouds are a fun way to show words, where the most important ones are bigger than the others. Make and share word clouds from any text with WordItOut!
  •  
    WOW!!!!
  •  
    Impressive. Really nice work!!!
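The idea behind a word cloud can be sketched in a few lines. This is not how WordItOut works internally. It assumes "importance" simply means frequency and maps counts linearly onto font sizes, and the function name, size range and sample text are made up for illustration.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=48, top_n=5):
    """Map the `top_n` most frequent words to font sizes (linear scaling)."""
    # Assumption: a word is a run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = max(highest - lowest, 1)  # avoid dividing by zero when all tie
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in counts
    }


# Hypothetical sample text.
sample = "data mining turns data into patterns and patterns into data products"
sizes = word_sizes(sample, top_n=3)
for word, size in sizes.items():
    print(f"{word}: {size:.0f}pt")
```

A real tool would also drop stop words such as "into" and lay the words out without overlaps, which is the harder part.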
LeopoldS

TPAC - Technology Policy and Assessment Center at the Georgia Institute of Technology - 2 views

  •  
    Francesco could you please have a look at this? technology behind? semantic? useful also for us?
Luís F. Simões

Robot biologist solves complex problem from scratch - 1 views

  • Ref.: Michael D Schmidt, et al., Automated refinement and inference of analytical models for metabolic networks, Physical Biology, 2011; 8 (5): 055011 [DOI: 10.1088/1478-3975/8/5/055011]
  •  
    The latest from Schmidt / Lipson / Eureqa. A significant improvement over their previous work is that now "The algorithm selects between multiple candidate models by designing experiments to make their predictions disagree."
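The quoted idea, choosing the next experiment so that the candidate models' predictions disagree, can be sketched as follows. This is not Eureqa's actual algorithm: using the variance of predictions as the disagreement measure, searching over a fixed list of inputs, and the three toy models are all assumptions for illustration.

```python
from statistics import pvariance


def most_discriminating_input(models, candidate_inputs):
    """Pick the input at which the models' predictions disagree the most.

    Disagreement is measured (by assumption) as the population
    variance of the predictions.
    """
    return max(
        candidate_inputs,
        key=lambda x: pvariance([model(x) for model in models]),
    )


# Hypothetical candidate models that all agree at x = 1.
candidate_models = [
    lambda x: x,
    lambda x: x ** 2,
    lambda x: 2 * x - 1,
]
inputs = [0.0, 0.5, 1.0, 1.5, 2.0]

best_input = most_discriminating_input(candidate_models, inputs)
print(f"Run the next experiment at x = {best_input}")
```

Measuring at x = 1 would tell us nothing here, since all three models predict the same value, whereas an experiment at the selected input can rule out at least some of them.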